26 research outputs found

    Efficient Analysis of Complex Diagrams using Constraint-Based Parsing

    Full text link
    This paper describes substantial advances in the analysis (parsing) of diagrams using constraint grammars. The addition of set types to the grammar and spatial indexing of the data make it possible to efficiently parse real diagrams of substantial complexity. The system is probably the first to demonstrate efficient diagram parsing using grammars that easily be retargeted to other domains. The work assumes that the diagrams are available as a flat collection of graphics primitives: lines, polygons, circles, Bezier curves and text. This is appropriate for future electronic documents or for vectorized diagrams converted from scanned images. The classes of diagrams that we have analyzed include x,y data graphs and genetic diagrams drawn from the biological literature, as well as finite state automata diagrams (states and arcs). As an example, parsing a four-part data graph composed of 133 primitives required 35 sec using Macintosh Common Lisp on a Macintosh Quadra 700.Comment: 9 pages, Postscript, no fonts, compressed, uuencoded. Composed in MSWord 5.1a for the Mac. To appear in ICDAR '95. Other versions at ftp://ftp.ccs.neu.edu/pub/people/futrell

    Full Text and Figure Display Improves Bioscience Literature Search

    Get PDF
    When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 said undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface

    C-ME: A 3D Community-Based, Real-Time Collaboration Tool for Scientific Research and Training

    Get PDF
    The need for effective collaboration tools is growing as multidisciplinary proteome-wide projects and distributed research teams become more common. The resulting data is often quite disparate, stored in separate locations, and not contextually related. Collaborative Molecular Modeling Environment (C-ME) is an interactive community-based collaboration system that allows researchers to organize information, visualize data on a two-dimensional (2-D) or three-dimensional (3-D) basis, and share and manage that information with collaborators in real time. C-ME stores the information in industry-standard databases that are immediately accessible by appropriate permission within the computer network directory service or anonymously across the internet through the C-ME application or through a web browser. The system addresses two important aspects of collaboration: context and information management. C-ME allows a researcher to use a 3-D atomic structure model or a 2-D image as a contextual basis on which to attach and share annotations to specific atoms or molecules or to specific regions of a 2-D image. These annotations provide additional information about the atomic structure or image data that can then be evaluated, amended or added to by other project members

    Preferred Spatial Frequencies for Human Face Processing Are Associated with Optimal Class Discrimination in the Machine

    Get PDF
    Psychophysical studies suggest that humans preferentially use a narrow band of low spatial frequencies for face recognition. Here we asked whether artificial face recognition systems have an improved recognition performance at the same spatial frequencies as humans. To this end, we estimated recognition performance over a large database of face images by computing three discriminability measures: Fisher Linear Discriminant Analysis, Non-Parametric Discriminant Analysis, and Mutual Information. In order to address frequency dependence, discriminabilities were measured as a function of (filtered) image size. All three measures revealed a maximum at the same image sizes, where the spatial frequency content corresponds to the psychophysical found frequencies. Our results therefore support the notion that the critical band of spatial frequencies for face recognition in humans and machines follows from inherent properties of face images, and that the use of these frequencies is associated with optimal face recognition performance

    Nominalization and Alternations in Biomedical Language

    Get PDF
    Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zerorelated nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

    Ambiguity in Visual Language Theory and its Role in Diagram Parsing

    No full text
    To take advantage of the ever-increasing volume of diagrams in electronic form, it is crucial that we have methods for parsing diagrams. Once a structured, content-based description is built for a diagram, it can be indexed for search, retrieval, and use. Whenever broadcoverage grammars are built to parse a wide range of objects, whether natural language or diagrams, the grammars will overgenerate, giving multiple parses. This is the ambiguity problem. This paper discusses the types of ambiguities that can arise in diagram parsing, as well as techniques to avoid or resolve them. One class of ambiguity is attachment, e.g., the determination of what graphic object is labeled by a text item. Two classes of ambiguities are unique to diagrams: segmentation and occlusion. Examples of segmentation ambiguities include the use of a portion of a single line as an entity itself. Occlusion ambiguities can be difficult to analyze if occlusion is deliberately used to create a novel object from its components. The paper uses our context-based constraint grammars to describe the origin and resolution of ambiguities. It assumes that diagrams are available as vector graphics, not bitmaps
    corecore